Using Bags of Symbols for Automatic Indexing of Graphical Document Image Databases
Internal identifier: 000F90 (Main/Exploration); previous: 000F89; next: 000F91
Authors: Eugen Barbu [France]; Pierre Héroux [France]; Sébastien Adam [France]; Éric Trupin [France]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2006.
Abstract
A database is only useful if it comes with a set of procedures for retrieving the elements relevant to users' needs. Many information retrieval (IR) techniques have been developed for automatic indexing and retrieval in document databases. Most of these use indexes that depend on the textual content of documents, and very few can handle graphical or image content without human annotation. This paper describes an approach, similar to the bag-of-words technique, for automatically indexing graphical document image databases, along with different ways to query these databases. In an unsupervised manner, the approach proposes a set of automatically discovered symbols that can be combined with logical operators to build queries.
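To make the querying idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes symbol discovery has already happened and that each document is represented as a bag (multiset) of symbol identifiers. The identifiers (`sym_3`, ...), the document names, and the helper functions `build_index`, `query_and`, `query_or`, and `query_not` are all hypothetical names introduced for illustration.

```python
from collections import Counter, defaultdict


def build_index(bags):
    """Map each symbol ID to the set of documents whose bag contains it."""
    index = defaultdict(set)
    for doc_id, bag in bags.items():
        for symbol in bag:
            index[symbol].add(doc_id)
    return index


def query_and(index, *symbols):
    """Documents containing every listed symbol."""
    sets = [index.get(s, set()) for s in symbols]
    return set.intersection(*sets) if sets else set()


def query_or(index, *symbols):
    """Documents containing at least one listed symbol."""
    return set().union(*(index.get(s, set()) for s in symbols))


def query_not(all_docs, docs):
    """Documents in the collection that are not in `docs`."""
    return set(all_docs) - docs


# Hypothetical bags: symbol ID -> number of occurrences in the document.
bags = {
    "plan_a": Counter({"sym_3": 2, "sym_7": 1}),
    "plan_b": Counter({"sym_3": 1}),
    "plan_c": Counter({"sym_7": 4, "sym_9": 1}),
}
index = build_index(bags)

both = query_and(index, "sym_3", "sym_7")          # sym_3 AND sym_7
either = query_or(index, "sym_3", "sym_9")         # sym_3 OR sym_9
without = query_not(bags, query_or(index, "sym_3"))  # NOT sym_3
```

The inverted index keeps only presence information, which is enough for logical queries; the occurrence counts stay in the bags and could be used for ranking if needed.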
DOI: 10.1007/11767978_18
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 003167
- to stream Istex, to step Curation: 002F26
- to stream Istex, to step Checkpoint: 000963
- to stream Main, to step Merge: 001007
- to stream Main, to step Curation: 000F90
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Using Bags of Symbols for Automatic Indexing of Graphical Document Image Databases</title>
<author><name sortKey="Barbu, Eugen" sort="Barbu, Eugen" uniqKey="Barbu E" first="Eugen" last="Barbu">Eugen Barbu</name>
</author>
<author><name sortKey="Heroux, Pierre" sort="Heroux, Pierre" uniqKey="Heroux P" first="Pierre" last="Héroux">Pierre Héroux</name>
</author>
<author><name sortKey="Adam, Sebastien" sort="Adam, Sebastien" uniqKey="Adam S" first="Sébastien" last="Adam">Sébastien Adam</name>
</author>
<author><name sortKey="Trupin, Eric" sort="Trupin, Eric" uniqKey="Trupin E" first="Éric" last="Trupin">Éric Trupin</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:03C7856EF4395B19A1389C74DF36265ED92B3436</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/11767978_18</idno>
<idno type="url">https://api.istex.fr/document/03C7856EF4395B19A1389C74DF36265ED92B3436/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">003167</idno>
<idno type="wicri:Area/Istex/Curation">002F26</idno>
<idno type="wicri:Area/Istex/Checkpoint">000963</idno>
<idno type="wicri:doubleKey">0302-9743:2006:Barbu E:using:bags:of</idno>
<idno type="wicri:Area/Main/Merge">001007</idno>
<idno type="wicri:Area/Main/Curation">000F90</idno>
<idno type="wicri:Area/Main/Exploration">000F90</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Using Bags of Symbols for Automatic Indexing of Graphical Document Image Databases</title>
<author><name sortKey="Barbu, Eugen" sort="Barbu, Eugen" uniqKey="Barbu E" first="Eugen" last="Barbu">Eugen Barbu</name>
<affiliation wicri:level="4"><country xml:lang="fr">France</country>
<wicri:regionArea>LITIS, Université de Rouen, F-76800, Saint-Etienne du Rouvray</wicri:regionArea>
<placeName><region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
<settlement type="city">Saint-Etienne du Rouvray</settlement>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">France</country>
</affiliation>
</author>
<author><name sortKey="Heroux, Pierre" sort="Heroux, Pierre" uniqKey="Heroux P" first="Pierre" last="Héroux">Pierre Héroux</name>
<affiliation wicri:level="4"><country xml:lang="fr">France</country>
<wicri:regionArea>LITIS, Université de Rouen, F-76800, Saint-Etienne du Rouvray</wicri:regionArea>
<placeName><region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
<settlement type="city">Saint-Etienne du Rouvray</settlement>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
<author><name sortKey="Adam, Sebastien" sort="Adam, Sebastien" uniqKey="Adam S" first="Sébastien" last="Adam">Sébastien Adam</name>
<affiliation wicri:level="4"><country xml:lang="fr">France</country>
<wicri:regionArea>LITIS, Université de Rouen, F-76800, Saint-Etienne du Rouvray</wicri:regionArea>
<placeName><region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
<settlement type="city">Saint-Etienne du Rouvray</settlement>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
<author><name sortKey="Trupin, Eric" sort="Trupin, Eric" uniqKey="Trupin E" first="Éric" last="Trupin">Éric Trupin</name>
<affiliation wicri:level="4"><country xml:lang="fr">France</country>
<wicri:regionArea>LITIS, Université de Rouen, F-76800, Saint-Etienne du Rouvray</wicri:regionArea>
<placeName><region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
<settlement type="city">Saint-Etienne du Rouvray</settlement>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2006</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">03C7856EF4395B19A1389C74DF36265ED92B3436</idno>
<idno type="DOI">10.1007/11767978_18</idno>
<idno type="ChapterID">18</idno>
<idno type="ChapterID">Chap18</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: A database is only useful if it comes with a set of procedures for retrieving the elements relevant to users' needs. Many information retrieval (IR) techniques have been developed for automatic indexing and retrieval in document databases. Most of these use indexes that depend on the textual content of documents, and very few can handle graphical or image content without human annotation. This paper describes an approach, similar to the bag-of-words technique, for automatically indexing graphical document image databases, along with different ways to query these databases. In an unsupervised manner, the approach proposes a set of automatically discovered symbols that can be combined with logical operators to build queries.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Haute-Normandie</li>
<li>Région Normandie</li>
</region>
<settlement><li>Saint-Etienne du Rouvray</li>
</settlement>
<orgName><li>Université de Rouen</li>
</orgName>
</list>
<tree><country name="France"><region name="Région Normandie"><name sortKey="Barbu, Eugen" sort="Barbu, Eugen" uniqKey="Barbu E" first="Eugen" last="Barbu">Eugen Barbu</name>
</region>
<name sortKey="Adam, Sebastien" sort="Adam, Sebastien" uniqKey="Adam S" first="Sébastien" last="Adam">Sébastien Adam</name>
<name sortKey="Barbu, Eugen" sort="Barbu, Eugen" uniqKey="Barbu E" first="Eugen" last="Barbu">Eugen Barbu</name>
<name sortKey="Heroux, Pierre" sort="Heroux, Pierre" uniqKey="Heroux P" first="Pierre" last="Héroux">Pierre Héroux</name>
<name sortKey="Trupin, Eric" sort="Trupin, Eric" uniqKey="Trupin E" first="Éric" last="Trupin">Éric Trupin</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F90 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000F90 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:03C7856EF4395B19A1389C74DF36265ED92B3436 |texte= Using Bags of Symbols for Automatic Indexing of Graphical Document Image Databases }}
This area was generated with Dilib version V0.6.32.